GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen
Scaling up deep neural network capacity is known to be an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other machine learning tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: we train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012; (ii) Multilingual Neural Machine Translation: we train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models.
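To make the batch-splitting schedule concrete, here is a minimal, framework-free Python sketch. The function name `pipelined_forward` and the list-of-functions `stages` are illustrative stand-ins, not GPipe's actual API. With K partitions and M micro-batches, the forward pass completes in K + M - 1 pipeline steps instead of K * M sequential ones, which is where the near-linear speedup comes from; the real library also pipelines the backward pass and re-materializes activations to save memory, both omitted here.

```python
# Minimal sketch of GPipe-style batch-splitting pipelining (forward pass only).
# Assumption: `stages` is a list of K stage functions, one per accelerator,
# obtained by partitioning a layer sequence; `micro_batches` has M entries.

def pipelined_forward(stages, micro_batches):
    """Push M micro-batches through K stages in K + M - 1 pipeline steps."""
    K, M = len(stages), len(micro_batches)
    in_flight = [None] * K   # activation currently queued at each stage
    outputs = []
    for step in range(K + M - 1):
        # Feed the next micro-batch into the first stage while any remain.
        if step < M:
            in_flight[0] = micro_batches[step]
        # Advance stages back-to-front so each result moves exactly one
        # stage per step without overwriting work queued further ahead.
        for k in reversed(range(K)):
            if in_flight[k] is not None:
                result = stages[k](in_flight[k])
                in_flight[k] = None
                if k + 1 < K:
                    in_flight[k + 1] = result
                else:
                    outputs.append(result)  # final stage: collect output
    return outputs

# Toy usage: 4 "stages" (plain functions standing in for layer groups)
# and 8 micro-batches; completes in 4 + 8 - 1 = 11 steps, not 32.
stages = [lambda x, i=i: x + i for i in range(4)]
assert pipelined_forward(stages, list(range(8))) == [x + 6 for x in range(8)]
```

In a real deployment each stage runs on its own accelerator, so all the per-step stage calls in the inner loop execute concurrently; the Python loop only models the schedule and its K - 1 step "bubble" of warm-up latency.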
Reviews: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Originality: The proposed algorithm has little in the way of surprising conceptual insights, but in this case that is a good thing: the parallelism algorithm is simple, intuitive, and achieves a nearly linear throughput increase in the number of accelerators used (hard to expect much more). It is surprising that this has not been done before, given that it is such a simple trick that yields such a large speedup when training distributed models. To train their very deep models, the authors also use a few other smaller tricks (e.g., clipping logits to mitigate bad gradients).
Significance & Quality: General-purpose model parallelism algorithm: the proposed algorithm is applicable to almost any neural network architecture without modification, and the authors demonstrate this by scaling up state-of-the-art architectures in both computer vision and NLP settings. In machine translation for low-resource languages, the gains seem quite substantial.
Clarity: Clearly written, aided by the simplicity of the algorithm.
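The review's aside about clipping logits deserves a concrete illustration. One common realization of the idea is soft-capping with tanh, which bounds logit magnitudes smoothly so a single outlier cannot blow up the softmax gradient. The exact clipping form and constant the authors used are not specified here, so the sketch below is an illustrative assumption, not the paper's method.

```python
import numpy as np

def soft_clip_logits(logits: np.ndarray, cap: float = 10.0) -> np.ndarray:
    """Soft-cap logits to the open interval (-cap, cap) via tanh.

    Illustrative stand-in for the "clipping logits" trick mentioned in
    the review; the exact form and constant used may differ.
    """
    return cap * np.tanh(logits / cap)

# An outlier logit of 1e4 is squashed to just under the cap, while
# small logits pass through almost unchanged.
z = np.array([1.0, 2.0, 1e4])
print(soft_clip_logits(z))  # approx. [0.997, 1.974, 10.0]
```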